refactor: fuse set_world_poses/set_scales into single _compose_fabric_transform#5674
refactor: fuse set_world_poses/set_scales into single _compose_fabric_transform#5674pv-nvidia wants to merge 2 commits into
Conversation
8b5f661 to
2ca6310
Compare
There was a problem hiding this comment.
🔍 Code Review Summary
This PR successfully refactors FabricFrameView to consolidate duplicate kernel-launch logic into a single _compose_fabric_transform method and addresses a critical assertion bug. The changes are well-structured and improve both correctness and maintainability.
✅ Correctness
No blocking issues found.
The refactoring maintains behavioral equivalence:
_compose_fabric_transformcorrectly handlesNoneparameters via empty arrays, preserving the kernel's ability to skip unspecified components- The
_fabric_usd_sync_doneflag is set correctly after writes - The
assert→RuntimeErrorchange in_rebuild_fabric_arraysis a valuable fix — assertions are stripped underpython -O, which could silently produce wrong poses or out-of-bounds kernel indices in production
Minor observation:
- The
_compose_fabric_transformmethod always calls_fabric_hierarchy.update_world_xforms()even when only scales are updated. This matches the originalset_scalesbehavior, so no regression, but worth noting for future optimization.
⚡ Performance
Net positive impact:
-
Reduced
PrepareForReusecalls — The critical fix:_sync_fabric_from_usd_oncenow invokesPrepareForReuseexactly once (via the fused compose) instead of twice. This prevents potential topology-change signal masking. -
Single kernel launch for initial sync — USD→Fabric sync now uses one
wp.launchinstead of two, reducing kernel launch overhead.
Micro-optimization opportunity (non-blocking):
# Current: creates empty arrays every call
empty3 = wp.zeros((0, 3), dtype=wp.float32, device=self._device)
empty4 = wp.zeros((0, 4), dtype=wp.float32, device=self._device)These could be cached as class-level constants, but the overhead is negligible for zero-sized arrays.
📝 Maintainability
Excellent improvements:
- Code deduplication — ~40 lines of duplicate kernel-launch boilerplate consolidated into one reusable method
- Clearer naming —
empty3/empty4communicates intent better thandummy3/dummy4 - Good documentation — The
_compose_fabric_transformdocstring clearly explains the single-PrepareForReuseguarantee and the_fabric_usd_sync_doneside effect - Test modernization —
wp.to_torch(fab_pos)→fab_pos.torchuses the current ProxyArray API
Changelogs are well-formatted and accurately describe both the behavioral changes and the assert fix.
🔗 Service Locator (Bundled Feature)
The typed service locator in SimulationContext is a clean pattern:
- Type-safe via
TypeVar - Automatic
close()onclear_instance() - Good test coverage for edge cases (re-registration, missing close method, etc.)
One consideration: set_service does not auto-close the replaced service (caller responsibility per docstring). This is documented but could be a subtle footgun. Consider adding an optional auto_close=True parameter in the future if replacement becomes common.
Verdict: Approve ✅
The refactoring achieves its goals cleanly:
- Eliminates duplicate code
- Fixes a real bug (assert under
-O) - Reduces PrepareForReuse call count
- Adds useful infrastructure (service locator)
All 36 existing Fabric tests pass per PR description. Ship it! 🚀
1b12c6c to
aad8a83
Compare
…factory
FabricFrameView had an internal _use_fabric flag that fell back to
UsdFrameView when Fabric was disabled or the device was unsupported.
This violated single-responsibility: FabricFrameView pretended to be
one class but sometimes behaved as another.
Now the FrameView factory handles all dispatch:
- PhysX + Fabric enabled + supported device → FabricFrameView
- PhysX without Fabric (or unsupported device) → UsdFrameView
- Newton → NewtonSiteFrameView
FabricFrameView no longer checks _use_fabric or _fabric_supported_devices.
It assumes Fabric is available (the factory guarantees this).
UsdFrameView is eagerly registered on the factory since it lives in
isaaclab (not a backend package), so FactoryBase's dynamic import
(isaaclab_{backend}.sim.views) can't discover it.
Extract _compose_fabric_transform() to deduplicate the kernel-launch logic shared by set_world_poses and set_scales. The initial USD->Fabric sync now composes position, orientation, and scale in one call, so PrepareForReuse is invoked exactly once per logical update. Also replace assert with RuntimeError in _rebuild_fabric_arrays so the topology-change guard survives python -O.
aad8a83 to
b0206ce
Compare
There was a problem hiding this comment.
🔍 Code Review Summary
This PR successfully refactors FabricFrameView to consolidate duplicate kernel-launch logic into a single _compose_fabric_transform method and moves Fabric/USD dispatch to the FrameView factory. The changes improve both correctness and maintainability.
✅ Correctness
No blocking issues found.
The refactoring maintains behavioral equivalence:
_compose_fabric_transformcorrectly handlesNoneparameters via empty arrays, preserving the kernel's ability to skip unspecified components- The
_fabric_usd_sync_doneflag is correctly set after writes - The
assert→RuntimeErrorchange in_rebuild_fabric_arraysis a valuable fix — assertions are stripped underpython -O, which could silently produce wrong poses or out-of-bounds kernel indices in production - Factory dispatch logic in
FrameView._get_backend()correctly checks bothfabricEnabledsetting and device support before routing toFabricFrameView
Minor observation:
- The
_compose_fabric_transformmethod always calls_fabric_hierarchy.update_world_xforms()even when only scales are updated. This matches the originalset_scalesbehavior, so no regression, but worth noting for future optimization.
⚡ Performance
Net positive impact:
-
Reduced
PrepareForReusecalls — The critical fix:_sync_fabric_from_usd_oncenow invokesPrepareForReuseexactly once (via the fused compose) instead of twice. This prevents potential topology-change signal masking. -
Single kernel launch for initial sync — USD→Fabric sync now uses one
wp.launchinstead of two, reducing kernel launch overhead.
Micro-optimization opportunity (non-blocking):
# Current: creates empty arrays every call
empty3 = wp.zeros((0, 3), dtype=wp.float32, device=self._device)
empty4 = wp.zeros((0, 4), dtype=wp.float32, device=self._device)These could be cached as class-level constants, but the overhead is negligible for zero-sized arrays.
📝 Maintainability
Excellent improvements:
- Code deduplication — ~40 lines of duplicate kernel-launch boilerplate consolidated into one reusable method
- Single-responsibility —
FabricFrameViewno longer contains dispatch logic; it assumes Fabric is available (factory guarantees this) - Clearer naming —
empty3/empty4communicates intent better thandummy3/dummy4 - Good documentation — The
_compose_fabric_transformdocstring clearly explains the single-PrepareForReuseguarantee and the_fabric_usd_sync_doneside effect - Test modernization —
wp.to_torch(fab_pos)→fab_pos.torchuses the current ProxyArray API
Changelogs are well-formatted and accurately describe both the behavioral changes and the assert fix.
🏭 Factory Dispatch (frame_view.py)
The dispatch logic is clean and well-structured:
FrameView._get_backend()checksSimulationContextfor physics manager type- PhysX path now explicitly checks
fabricEnabledsetting and device compatibility UsdFrameViewis eagerly registered since it lives inisaaclab, not a backend package- Good warning message when Fabric is enabled but device is unsupported
One nit (non-blocking): The _FABRIC_SUPPORTED_DEVICES tuple could potentially be shared between frame_view.py and any other code that needs it (currently duplicated between factory and docs), but this is minor.
⚠️ CI Status
The "Check changelog fragments" job is failing. Please verify the changelog fragment format matches project conventions.
Verdict
The refactoring achieves its goals cleanly:
- ✅ Eliminates duplicate kernel-launch code
- ✅ Fixes a real bug (assert stripped under
-O) - ✅ Reduces PrepareForReuse call count in initial sync
- ✅ Moves dispatch responsibility to the factory (single-responsibility principle)
- ✅ All 36 existing Fabric tests pass per PR description
Once the changelog CI check passes, this is ready to merge. 🚀
# Description Removes the `cuda:0`-only restriction in `FabricFrameView`. USDRT `SelectPrims` now accepts any CUDA device index, so Fabric acceleration runs on the simulation device (e.g., `cuda:1`) instead of silently falling back to the slower USD path. This unblocks distributed training where each process is pinned to a specific GPU. Changes: - **Drop device allowlist.** Removes `_fabric_supported_devices`, the device guard in `__init__`, and the corresponding assertion in `_initialize_fabric`. Any CUDA device (or CPU) now works. - **Multi-GPU test coverage.** Three `cuda:1`-parameterized tests gated by `ISAACLAB_TEST_MULTI_GPU=1` env var, plus a dedicated CI workflow on the multi-GPU runner that sets it. - **Fix deprecated `wp.to_torch()` calls.** Replaced with `.torch` accessor on ProxyArray (avoids DeprecationWarning). - **TODOs for follow-up PRs.**: - #5673 - #5674 ## Type of change - New feature (non-breaking change which adds functionality) `cuda:0` continues to work exactly as before; `cuda:1`+ now also works instead of silently falling back to USD. No public API surface changed. ## Checklist - [x] I have read and understood the [contribution guidelines](https://isaac-sim.github.io/IsaacLab/main/source/refs/contributing.html) - [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with `./isaaclab.sh --format` - [x] I have made corresponding changes to the documentation - [x] My changes generate no new warnings - [x] I have added tests that prove my fix is effective or that my feature works - [x] I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file - [x] I have added my name to the `CONTRIBUTORS.md` or my name already exists there > Note: this PR uses a fragment file at `source/isaaclab_physx/changelog.d/feat-frame-view-enable-mgpu.rst` per the fragment-based changelog system. ## Test plan Three new tests gated by `ISAACLAB_TEST_MULTI_GPU=1` and parameterized with `["cuda:1"]`: - `test_fabric_cuda1_world_pose_roundtrip` — `set_world_poses` → `get_world_poses` returns the same values on a non-primary CUDA device. - `test_fabric_cuda1_no_usd_writeback` — Fabric writes on `cuda:1` do not write back to USD. - `test_fabric_cuda1_scales_roundtrip` — covers the `set_scales` write path on `cuda:1`. A dedicated CI workflow (`test-fabric-multi-gpu.yaml`) runs on the `[self-hosted, linux, x64, gpu, multi-gpu]` runner with `ISAACLAB_TEST_MULTI_GPU=1` set. Pre-flights with `nvidia-smi` and `torch.cuda.device_count()`, fails loudly if the runner has < 2 GPUs. To verify locally on a multi-GPU machine: ```bash ISAACLAB_TEST_MULTI_GPU=1 ./isaaclab.sh -p -m pytest \ source/isaaclab_physx/test/sim/test_views_xform_prim_fabric.py -v ``` To verify the `cuda:0` path is unchanged (multi-GPU tests auto-skip): ```bash ./isaaclab.sh -p -m pytest \ source/isaaclab_physx/test/sim/test_views_xform_prim_fabric.py -v ```
|
Superseded by the consolidated PR #5728 (pv/fabric-full-stack). |
Problem
FabricFrameViewhad duplicated kernel-launch logic inset_world_posesandset_scales, and the initial USD→Fabric sync called both methods sequentially. This meant:PrepareForReuse— the initial USD→Fabric sync in_sync_fabric_from_usd_oncecalledset_world_posesthenset_scales, each invokingPrepareForReuse. A second non-idempotentPrepareForReusecall could mask a topology-change signal that should have triggered a fabricarray rebuild.Solution
Extract
_compose_fabric_transform(positions=None, orientations=None, scales=None, indices=None)— a single method that composes any subset of transform components into one kernel launch. Components left asNoneare skipped via empty arrays.set_world_poses→ delegates to_compose_fabric_transform(positions=..., orientations=...)set_scales→ delegates to_compose_fabric_transform(scales=...)_sync_fabric_from_usd_once→ single fused call with all three componentsAdditional fix
The topology-change invariant guard in
_rebuild_fabric_arraysusedassert, which is stripped underpython -O. Replaced withraise RuntimeErrorso it's always active.Tests
All 36 existing Fabric tests pass (+ 2 xfail).
Merge Order
This PR is part of a stacked series for Fabric-accelerated local poses:
this depends on